A C-DAG task model for scheduling complex real-time tasks on heterogeneous platforms: preemption matters
Recent commercial hardware platforms for embedded real-time systems feature
heterogeneous processing units and computing accelerators on the same
System-on-Chip. When designing complex real-time applications for such
architectures, the designer needs to make a number of difficult choices: on
which processor should a certain task be implemented? Should a component be
implemented in parallel or sequentially? These choices may have a great impact
on feasibility, as differences in the processors' internal architectures
affect the tasks' execution time and preemption cost. To help the designer
explore the wide space of design choices and tune the scheduling parameters, in
this paper we propose a novel real-time application model, called C-DAG,
specifically conceived for heterogeneous platforms. A C-DAG allows specifying
alternative implementations of the same component of an application for
different processing engines to be selected off-line, as well as conditional
branches to model if-then-else statements to be selected at run-time. We also
propose a schedulability analysis for the C-DAG model and a heuristic
allocation algorithm so that all deadlines are respected. Our analysis takes
into account the cost of preempting a task, which can be non-negligible on
certain processors. We demonstrate the effectiveness of our approach on a large
set of synthetic experiments by comparing it with state-of-the-art algorithms in
the literature.
Work-in-Progress: NVIDIA GPU Scheduling Details in Virtualized Environments
Modern automotive-grade embedded platforms feature high-performance Graphics Processing Units (GPUs) to support the massively parallel processing power needed for next-generation autonomous driving applications. Hence, a GPU scheduling approach with strong real-time guarantees is needed. While previous research efforts focused on reverse engineering the GPU ecosystem in order to understand and control GPU scheduling on NVIDIA platforms, we provide an in-depth explanation of the standard NVIDIA approach to GPU application scheduling on a Drive PX platform. Then, we discuss how a privileged scheduling server can be used to enforce arbitrary scheduling policies in a virtualized environment.
A Novel Real-Time Edge-Cloud Big Data Management and Analytics Framework for Smart Cities
Exposing city information to dynamic, distributed, powerful, scalable, and user-friendly big data systems is expected to open up a wide range of new opportunities; however, the size, heterogeneity and geographical dispersion of data often make it difficult to combine, analyze and consume them in a single system. In the context of the H2020 CLASS project, we describe an innovative framework aiming to facilitate the design of advanced big-data analytics workflows. The proposal covers the whole compute continuum, from edge to cloud, and relies on a well-organized distributed infrastructure exploiting: a) edge solutions with advanced computer vision technologies enabling the real-time generation of “rich” data from a vast array of sensor types; b) cloud data management techniques offering efficient storage, real-time querying and updating of the high-frequency incoming data at different granularity levels. We specifically focus on obstacle detection and tracking for edge processing, and consider a traffic density monitoring application, with hierarchical data aggregation features, for cloud processing; the discussed techniques will constitute the groundwork enabling many further services. The tests are performed on the real use case of the Modena Automotive Smart Area (MASA).
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Most recent state-of-the-art architectures rely on combinations and
variations of three approaches: convolutional, recurrent and self-attentive
methods. Our work attempts to lay the basis for a new research direction for
sequence modeling based upon the idea of modifying the sequence length. To
this end, we propose a new method called "Expansion Mechanism" which
transforms the input sequence, either dynamically or statically, into a new one
featuring a different sequence length. Furthermore, we introduce a novel
architecture that exploits this method and achieves competitive performance on
the MS-COCO 2014 data set, yielding 134.6 and 131.4 CIDEr-D on the Karpathy
test split in the ensemble and single model configuration respectively and 130
CIDEr-D in the official online evaluation server, despite being neither
recurrent nor fully attentive. At the same time we address the efficiency
aspect in our design and introduce a convenient training strategy suitable for
most computational resources in contrast to the standard one. Source code is
available at https://github.com/jchenghu/explorin
A Perspective on Safety and Real-Time Issues for GPU Accelerated ADAS
The current trend in designing Advanced Driving Assistance Systems (ADAS) is to enhance their computing power by using modern multi/many-core accelerators. For many critical applications such as pedestrian detection, line following, and path planning, the Graphics Processing Unit (GPU) is the most popular choice for obtaining orders-of-magnitude increases in performance at modest power consumption. This is made possible by exploiting the general-purpose nature of today's GPUs, as such devices are known to deliver unprecedented performance per watt on generic embarrassingly parallel workloads (as opposed to just graphical rendering, which is all that GPUs of previous generations were designed to sustain). In this work, we explore novel challenges that system engineers have to face in terms of real-time constraints and functional safety when the GPU is the chosen accelerator. More specifically, we investigate how much of the safety standards currently applied to traditional platforms can be carried over to a GPU-accelerated platform used in critical scenarios.
API Comparison of CPU-To-GPU Command Offloading Latency on Embedded Platforms (Artifact)
High-performance heterogeneous embedded platforms allow offloading of parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). A time-predictable characterization of task submission is a must in real-time applications. We provide a profiler of the time spent by the CPU for submitting a stereotypical GP-GPU workload shaped as a Deep Neural Network of parameterized complexity. The submission is performed using the latest APIs available: NVIDIA CUDA, including its various techniques, and Vulkan. Complete automation of the test on Jetson Xavier is also provided by scripts that install software dependencies, run the experiments, and collect results in a PDF report.
Novel Methodologies for Predictable CPU-To-GPU Command Offloading
There is an increasing industrial and academic interest in a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as General Purpose-Graphic Processing Units (GP-GPUs). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU for submitting GP-GPU operations. We will show that the impact of CPU-to-GPU kernel submissions may indeed be relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms.
This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows improved control of GPU command submissions. We will show that this added control may significantly improve the application performance and predictability due to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous real-time systems.
Our findings are evaluated on a latest-generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity.
Environmental conditions in river segments intercepted by culverts
The conservation and maintenance of the quality of the rheophilic environment are directly related to knowledge of the physical and chemical characteristics and structural patterns of these systems, especially in streams. Long stretches of small water bodies are highly altered by the construction of highways and roads, which tend to modify their natural characteristics, affecting the environmental quality. This study describes vegetation and morphogeometric parameters of streams with culverts along their courses, reporting spatial differences in environmental characteristics (vegetation, morphogeometric, physical, and chemical) between sampling points upstream and downstream of the culvert. Specifically, we evaluated the width, depth, riparian vegetation, bottom substrate, and physical and chemical properties of the water, to identify possible differences between the sections upstream and downstream of the culvert. The rapid assessment protocol (RAP) was applied to stretches of 200 meters upstream and downstream of culverts in two Neotropical streams, between November 2009 and October 2010. The vegetation and morphogeometric attributes differed between the upstream and downstream stretches because the culvert has an impoundment effect. This turns the upstream section into a flooded, often shallow environment and directly influences the movement of sediment. The physical and chemical variables of the water showed no spatial variation. (Original Portuguese title: Condições ambientais de segmentos fluviais interceptados por bueiros.)
Morphology change in nematic membranes induced by defects
The cell membrane is one of the most important structures of living organisms. This is due to the many functions attributed to it, such as selective permeability, protection, anchoring to the cytoskeleton and many others. Any change in the shape of the cell membrane may directly affect its properties and abilities. In this article, we study how defects in the liquid crystalline organization of a membrane can change its shape. To do so, we consider a membrane with orientational order, i.e., a nematic membrane, which can occur in biological membranes, nematic films and other systems, and study how a defect in this order can change the shape of the membrane when the bending rigidity is considered. We find that, depending on the ratio of rigidity to elastic constant, buckling of this membrane may occur and turn it into pseudo-spheres.